
    MEMO: Coverage-guided Model Generation For Deep Learning Library Testing

    Recent deep learning (DL) applications are mostly built on top of DL libraries. The quality assurance of these libraries is critical to the dependable deployment of DL applications. A few techniques have therefore been proposed to test DL libraries by generating DL models as test inputs. These techniques feed the generated models to DL libraries for inference, in order to exercise the library modules involved in a DL model's execution. However, the test effectiveness of these techniques is constrained by the diversity of the generated DL models. Our investigation finds that these techniques can cover at most 11.7% of layer pairs (i.e., call sequences between two layer APIs) and 55.8% of layer parameters (e.g., "padding" in Conv2D). As a result, many bugs arising from specific layer pairs and parameters can be missed by existing techniques. In view of these limitations, we propose MEMO to efficiently generate diverse DL models by exploring layer types, layer pairs, and layer parameters. MEMO (1) designs an initial model reduction technique to boost test efficiency without compromising model diversity, and (2) designs a set of mutation operators for a customized Markov Chain Monte Carlo (MCMC) algorithm to explore new layer types, layer pairs, and layer parameters. We evaluate MEMO on seven popular DL libraries, including four for model execution (TensorFlow, PyTorch, MXNet, and ONNX) and three for model conversion (Keras-MXNet, TF2ONNX, ONNX2PyTorch). The evaluation results show that MEMO outperforms recent works by covering 10.3% more layer pairs, 15.3% more layer parameters, and 2.3% more library branches. Moreover, MEMO detects 29 new bugs in the latest versions of DL libraries; 17 of them have been confirmed by DL library developers, and 5 of those confirmed bugs have been fixed. Comment: 11 pages, 8 figures
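The layer-pair coverage metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each model is a flat sequence of layer-type names and that the universe of pairs is every ordered pair of layer types; the function names `layer_pairs` and `pair_coverage` are hypothetical and are not taken from MEMO itself, whose models may be general graphs.

```python
from itertools import product


def layer_pairs(model):
    """Return the set of adjacent (previous, next) layer-type pairs.

    `model` is assumed to be a list of layer-type names in call order.
    """
    return set(zip(model, model[1:]))


def pair_coverage(models, layer_types):
    """Fraction of all ordered layer-type pairs exercised by `models`."""
    universe = set(product(layer_types, repeat=2))
    covered = set()
    for model in models:
        covered |= layer_pairs(model)
    return len(covered & universe) / len(universe)


if __name__ == "__main__":
    types = ["Conv2D", "ReLU", "Dense"]
    generated = [["Conv2D", "ReLU", "Dense"], ["Dense", "ReLU"]]
    print(f"layer-pair coverage: {pair_coverage(generated, types):.1%}")
```

A generator that only ever emits a few common layer orderings leaves most of the pair universe untouched, which is the diversity gap the abstract describes.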

    Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting

    Automatically detecting software failures is an important task and a longstanding challenge. It requires finding failure-inducing test cases whose inputs can trigger the software's fault, and constructing an automated oracle to detect the software's incorrect behaviors. Recent advances in large language models (LLMs) motivate us to study how far this challenge can be addressed by ChatGPT, a state-of-the-art LLM. Unfortunately, our study shows that ChatGPT has a low probability (28.8%) of finding correct failure-inducing test cases for buggy programs. A possible reason is that finding failure-inducing test cases requires analyzing the subtle code differences between a buggy program and its correct version. When the two versions have similar syntax, ChatGPT is weak at recognizing such differences. Our insight is that ChatGPT's performance can be substantially enhanced when it is guided to focus on the subtle code difference. We also observe that ChatGPT is effective at inferring the intended behavior of a buggy program. The intended behavior can be leveraged to synthesize programs, making the subtle code difference between a buggy program and its correct version (i.e., the synthesized program) explicit. Driven by this observation, we propose a novel approach that synergistically combines ChatGPT and differential testing to find failure-inducing test cases. We evaluate our approach on QuixBugs (a benchmark of buggy programs) and compare it with state-of-the-art baselines, including direct use of ChatGPT and Pynguin. The experimental results show that our approach has a much higher probability (77.8%) of finding correct failure-inducing test cases, 2.7X that of the best baseline. Comment: Accepted to the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)
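The differential-testing idea in this abstract can be sketched as follows. This is an illustrative sketch only: the buggy function, the reference function (standing in for a program synthesized from the inferred intended behavior), and the helper `find_failure_inducing` are all hypothetical and do not come from the paper or from QuixBugs.

```python
def buggy_is_sorted(xs):
    # Hypothetical bug: the range stops one short, so the last
    # adjacent pair is never compared.
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 2))


def reference_is_sorted(xs):
    # Stand-in for a program synthesized from the inferred intent.
    return all(a <= b for a, b in zip(xs, xs[1:]))


def find_failure_inducing(program, reference, candidates):
    """Return (input, actual, expected) for every input on which
    the program under test disagrees with the reference."""
    failures = []
    for x in candidates:
        actual, expected = program(x), reference(x)
        if actual != expected:
            failures.append((x, actual, expected))
    return failures


if __name__ == "__main__":
    inputs = [[1, 2, 3], [3, 1, 2], [1, 3, 2], [2, 1]]
    for case in find_failure_inducing(buggy_is_sorted, reference_is_sorted, inputs):
        print(case)
```

Only the inputs whose unsorted pair sits at the very end expose the bug, which shows why the test input has to target the subtle difference between the two versions.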

    Differential Spectral Normalization (DSN) for PDE Discovery

    Partial differential equations (PDEs) play a prominent role in many disciplines for describing the governing systems of interest. Traditionally, PDEs are derived from first principles. In the era of big data, the need to uncover PDEs from massive data sets is emerging and becoming essential. One of the latest advances in PDE discovery models is PDE-Net, which has shown promising predictive power with its moment-constrained convolutional filters, but may suffer from noisy data and the numerical instability intrinsic to numerical differentiation. We propose a novel and robust regularization method tailored for moment-constrained convolutional filters, namely Differential Spectral Normalization (DSN), to allow accurate estimation of coefficient functions and stable prediction of dynamics over a long time horizon. We investigated the effectiveness of DSN against batch normalization, dropout, spectral normalization, weight decay, weight normalization, Jacobian regularization and orthonormal regularization, and provide empirical evidence that DSN is the most effective of these methods because it learns the convolutional filters in a robust manner. Numerical experiments further reveal that DSN has substantial potential to uncover hidden PDEs in a scarce-data setting and to predict dynamical behavior over a long time horizon, even in a noisy environment where all data samples are contaminated with noise.
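For orientation, the sketch below shows plain spectral normalization, one of the baselines listed in this abstract, and not DSN itself, whose details the abstract does not give. It assumes a weight matrix stored as a list of rows and estimates the largest singular value by power iteration; the function names are hypothetical.

```python
import math


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W by power iteration
    on W^T W, starting from an all-ones vector."""
    Wt = [list(col) for col in zip(*W)]
    v = [1.0] * len(W[0])
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:  # start vector lies in the null space of W
            return 0.0
        v = [x / norm for x in v]
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def spectrally_normalize(W, iters=100):
    """Rescale W so that its largest singular value is about 1."""
    sigma = spectral_norm(W, iters)
    return [[w / sigma for w in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))
    print(spectrally_normalize(W))
```

Bounding the largest singular value limits how much a layer can amplify perturbations in its input, which is the general motivation for this family of regularizers on noisy data.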

    Diagnostic extended usefulness of RMI: comparison of four risk of malignancy index in preoperative differentiation of borderline ovarian tumors and benign ovarian tumors

    Background: This study aimed to examine the performance of four risk of malignancy indices (RMIs) in discriminating borderline ovarian tumors (BOTs) from benign ovarian masses in daily clinical practice. Methods: A total of 162 women with BOTs and 379 women with benign ovarian tumors diagnosed at the Second Affiliated Hospital of Harbin Medical University from January 2012 to December 2016 were enrolled in this retrospective study. We also classified the BOT patients into serous borderline ovarian tumor (SBOT) and mucinous borderline ovarian tumor (MBOT) subgroups. Preoperative ultrasound findings, cancer antigen 125 (CA125) and menopausal status were reviewed. The area under the curve (AUC) of receiver operating characteristic (ROC) curves and the performance indices of RMI I, RMI II, RMI III and RMI IV were calculated and compared for discrimination between benign ovarian tumors and BOTs. Results: RMI I had the highest AUC (0.825, 95% CI: 0.790–0.856) among the four RMIs in the BOTs group. Similar results were found in the SBOT (0.839, 95% CI: 0.804–0.871) and MBOT (0.791, 95% CI: 0.749–0.829) subgroups. RMI I had the highest specificity in the BOTs group (87.6%, 95% CI: 83.9–90.7%), the SBOT subgroup (87.6%, 95% CI: 83.9–90.7%) and the MBOT subgroup (87.6%, 95% CI: 83.9–90.7%). RMI II had the highest sensitivity in the BOTs group (69.75%, 95% CI: 62.1–76.7%), the SBOT subgroup (74.34%, 95% CI: 65.3–82.1%) and the MBOT subgroup (59.18%, 95% CI: 44.2–73.0%). Conclusion: Compared with the other RMIs, RMI I was the best-performing method for differentiating BOTs from benign ovarian tumors. RMI I also performed best in discriminating SBOT from benign ovarian tumors.
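The sensitivity and specificity figures in this abstract follow the standard definitions, sketched below. The counts used in the example (113 of 162 BOTs correctly flagged, 332 of 379 benign tumors correctly cleared) are an assumption: they are back-calculated for illustration from the reported group sizes and percentages and do not appear in the abstract.

```python
def sensitivity(true_pos, false_neg):
    """Share of diseased cases the index flags: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of benign cases the index clears: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, back-calculated for illustration only.
    sens = sensitivity(true_pos=113, false_neg=162 - 113)
    spec = specificity(true_neg=332, false_pos=379 - 332)
    print(f"sensitivity = {sens:.2%}, specificity = {spec:.1%}")
```

Because the two measures are computed on different groups (diseased versus benign), one index can lead on specificity while another leads on sensitivity, as the results for RMI I and RMI II show.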

    Multigene Profiling of Circulating Tumor Cells in Esophageal Squamous Cell Carcinoma Identifies Prognostic Cancer Driver Genes Associated with Epithelial-Mesenchymal-Transition Progression and Chemoresistance

    We investigated the clinical significance of circulating tumor cells (CTCs) in cancer progression by detecting multiple cancer driver genes associated with epithelial-to-mesenchymal transition (EMT) at the transcript level. A 10-gene panel, comprising CCND1, ECT2, EpCAM, FSCN1, KRT5, KRT18, MET, TFRC, TWIST1, and VEGFC, was established for characterizing CTCs from mouse esophageal squamous cell carcinoma (ESCC) xenograft models and clinical ESCC peripheral blood (PB) samples. Correlations between gene expression in CTCs from PB samples (n = 77) and clinicopathological features in ESCC patients (n = 55) were examined. The presence of CTCs at baseline was significantly correlated with tumor size (p = 0.031). CTC-high status was significantly correlated with advanced cancer stages (p = 0.013) and distant metastasis (p = 0.029). High mRNA levels of TWIST1 (Hazard Ratio (HR) = 5.44, p = 0.007), VEGFC (HR = 6.67, p [...]), TFRC (HR = 2.63, p = 0.034), and EpCAM (HR = 2.53, p = 0.041) at baseline were significantly associated with shorter overall survival (OS) in ESCC patients. This study also revealed that TWIST1 facilitates EMT and enhances malignant potential by promoting tumor migration, invasion, and cisplatin chemoresistance through the TWIST1-TGFBI-ZEB1 axis in ESCC, highlighting the prognostic and therapeutic potential of TWIST1 in clinical ESCC treatment.

    Circulating Tumor Cell Enumeration for Serial Monitoring of Treatment Outcomes for Locally Advanced Esophageal Squamous Cell Carcinoma

    We aim to reveal the clinical significance and potential usefulness of dynamic monitoring of CTCs to track therapeutic responses and improve survival for advanced ESCC patients. Peripheral blood (PB) (n = 389) and azygos vein blood (AVB) (n = 13) samples were collected prospectively from 88 ESCC patients undergoing curative surgery from 2017 to 2022. Longitudinal CTC enumeration was performed with epithelial (EpCAM/pan-cytokeratins/MUC1) and mesenchymal (vimentin) markers at 12 serial timepoints spanning pre-treatment, post-treatment/pre-surgery, post-surgery follow-up for 3 years, and relapse. Longitudinal real-time CTC analysis in PB and AVB suggests that more CTCs are released into the circulation early, at pre-surgery and 3 months post-surgery, in the CTRT group than in the up-front surgery group. High CTC levels at pre-treatment and at 1 or 3 months post-surgery, as well as unfavorable changes in CTC levels between post-treatment/pre-surgery and 1 or 3 months post-surgery (Hazard Ratio (HR) = 6.662, p < 0.001), were independent prognosticators for curative treatment. Unfavorable pre-surgery CTC status was an independent prognostic and predictive factor for neoadjuvant treatment efficacy (HR = 3.652, p = 0.035). Aggressive CTC clusters were observed more frequently in AVB than in PB. This is the first report of their role as an independent prognosticator of relapse in ESCC (HR = 2.539, p = 0.068). CTC clusters and longitudinal CTC monitoring provide useful prognostic information and potential predictive biomarkers to help guide clinicians in improving disease management.